Zen and the Art of Symbolic Computing: Light and Fast Applicative Algorithms for Computational Linguistics

Author

  • Gérard P. Huet
Abstract

Computational linguistics is an application of computer science that presents interesting challenges from the point of view of programming methodology. Developing a realistic platform for the treatment of a natural language in its phonological, morphological, syntactic, and ultimately semantic aspects demands a principled modular architecture with complex cooperation between the various layers. Representing large lexical databases, treating sophisticated phonological and morphological transformations, and processing large corpora in real time demand fast finite-state toolkits. Analysing syntactic structure, computing anaphoric relations, and representing information flow in dialogue understanding demand the processing of complex constraints on graph structures, with sophisticated sharing of large nondeterministic search spaces. The talk reports on experiments in using declarative programming to process the Sanskrit language in its phonological and morphological aspects. A lexicon-based morphological tagger has been designed, using an original algorithm for the analysis of euphony (the so-called sandhi process, which glues together the words of a sentence into a continuous stream of phonemes). This work, described in [2], has been implemented in a purely applicative core subset of Objective Caml [5]. The basic structures underlying this methodology have been abstracted in the Zen toolkit, distributed as free software [3]. Two complementary techniques have been put to use. Firstly, we advocate the systematic use of zippers [1] for programming mutable data structures in an applicative way. Zippers, or linear contexts, are related to the interaction combinators of linear logic. Secondly, a sharing functor allows the uniform minimisation of inductive data structures by representing them as shared dags.
This is similar to the traditional technique of bottom-up hashing, but the computation of the keys is left to the client invoking the functor. This has two advantages: keys are computed during the bottom-up traversal of the structure and, more importantly, their computation may profit from specific statistical properties of the data at hand, optimising bucket balancing in ways that would be unattainable by generic functions. These two complementary technologies are discussed in [4]. The talk discusses the use of these tools in the uniform representation of finite state automata and transducers as decorated lexical trees (also
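
The zipper technique mentioned in the abstract can be illustrated with a minimal OCaml sketch. It is not taken from the Zen toolkit: the binary tree type and all names (`tree`, `context`, `zipper`, `down_left`, `down_right`, `up`, `replace`, `to_tree`) are assumptions chosen for illustration. The idea shown is that a focused subtree plus a context recording the path back to the root allows local "updates" without any mutation.

```ocaml
(* Illustrative sketch only; names and the tree type are hypothetical,
   not the Zen toolkit's API. *)

(* A binary tree with values at the nodes. *)
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

(* A context is the path from the focus back to the root: at each step it
   records which side we descended into and what was left behind. *)
type 'a context =
  | Top
  | Left of 'a context * 'a * 'a tree    (* went left: keep value and right sibling *)
  | Right of 'a tree * 'a * 'a context   (* went right: keep left sibling and value *)

(* A zipper pairs the subtree in focus with its context. *)
type 'a zipper = { focus : 'a tree; context : 'a context }

let of_tree t = { focus = t; context = Top }

(* Move the focus to the left child, if there is one. *)
let down_left z =
  match z.focus with
  | Leaf -> None
  | Node (l, v, r) -> Some { focus = l; context = Left (z.context, v, r) }

(* Move the focus to the right child, if there is one. *)
let down_right z =
  match z.focus with
  | Leaf -> None
  | Node (l, v, r) -> Some { focus = r; context = Right (l, v, z.context) }

(* Move the focus to the parent, rebuilding one node from the context. *)
let up z =
  match z.context with
  | Top -> None
  | Left (c, v, r) -> Some { focus = Node (z.focus, v, r); context = c }
  | Right (l, v, c) -> Some { focus = Node (l, v, z.focus); context = c }

(* Applicative "update": return a new zipper with a different focus. *)
let replace t z = { z with focus = t }

(* Walk back to the root to recover the whole (possibly edited) tree. *)
let rec to_tree z =
  match up z with
  | None -> z.focus
  | Some parent -> to_tree parent
```

Replacing the focus costs constant time, and the original tree remains valid afterwards because only the nodes along the path to the root are rebuilt.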

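The sharing functor described in the abstract might be sketched as follows. This is a hedged illustration, not the Zen toolkit's actual code: the `DOMAIN` signature, the `Share` functor, the `trie` type, the table size, and the key formula in `minimise` are all assumptions. What it shows is the division of labour the abstract describes: the functor only stores and retrieves canonical representatives, while the client computes the hash key during its own bottom-up traversal.

```ocaml
(* Illustrative sketch only; all names and the key formula are hypothetical. *)

(* What the client supplies: the element type and the number of buckets. *)
module type DOMAIN = sig
  type t
  val size : int
end

module Share (D : DOMAIN) : sig
  val share : D.t -> int -> D.t
end = struct
  (* One bucket (a list of canonical representatives) per key slot. *)
  let memo : D.t list array = Array.make D.size []

  (* [share x key] returns the stored value structurally equal to [x],
     registering [x] in its bucket if none exists yet.
     The key is assumed to be non-negative. *)
  let share x key =
    let i = key mod D.size in
    let bucket = memo.(i) in
    match List.find_opt (fun y -> y = x) bucket with
    | Some y -> y
    | None -> memo.(i) <- x :: bucket; x
end

(* A lexical tree: an acceptance flag and labelled arcs to subtrees. *)
type trie = Trie of bool * (char * trie) list

module TrieShare = Share (struct type t = trie let size = 9973 end)

(* Bottom-up minimisation: children are shared first, and the key of a node
   is computed from the keys of its children during the same traversal. *)
let rec minimise (Trie (accept, arcs)) : trie * int =
  let arcs', key =
    List.fold_right
      (fun (c, child) (acc, k) ->
         let child', ck = minimise child in
         ((c, child') :: acc, (31 * k + Char.code c + 17 * ck) land max_int))
      arcs
      ([], if accept then 1 else 0)
  in
  (TrieShare.share (Trie (accept, arcs')) key, key)
```

In this sketch, two words with a common suffix end up pointing at the same physical subtree, so the tree becomes a dag. A client who knows the statistics of its data could swap in a different key formula without touching `Share`.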

Similar resources

Linear Contexts and the Sharing Functor: Techniques for Symbolic Computation

We present in this paper two design issues concerning fundamental representation structures for symbolic and logic computations. The first one concerns structured editing, or more generally the possibly destructive update of tree-like data-structures of inductive types. Instead of the standard implementation of mutable data structures containing references, we advocate the zipper technology, fu...

The Treatment of Japanese Garden based on Zen Philosophy in Mental Health

By designing conditions and relaxing spaces with the help of values and concepts of Japanese architecture, based on Zen's philosophy, which is based on simplicity, purity, avoidance of complexity and relaxation, it has tried to bring an experience of relaxation to the audience. This article seeks to investigate a principled relationship between the principles of Japanese gardens in accordance with the ...

Algorithms for Computing Limit distributions of Oscillating Systems with Finite Capacity

We address the batch arrival systems with finite capacity under partial batch acceptance strategy where service times or rates oscillate between two forms according to the evolution of the number of customers in the system. Applying the theory of Markov regenerative processes and resorting to Markov chain embedding, we present a new algorithm for computing limit distributions of the number cus...

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

Sweep Line Algorithm for Convex Hull Revisited

The convex hull of a set of given points is the intersection of all convex sets containing them. It is used as a primary structure in many other problems in computational geometry and in other areas such as image processing, model identification, geographical data systems, and triangular computation of a set of points. Computing the convex hull of a set of points is one of the most fundamental and imp...

Publication date: 2003